Synopsis

This analysis looks at San Francisco crime data. It addresses the questions:
- Does the number of crimes show obvious day to day variation, particularly between weekdays and weekends?
- How does crime vary day to day on a per district basis?
- Can we see hotspots where crimes are most prevalent in San Francisco?
- Do particular crimes happen more frequently at different times of day?

The data suggest that police patrols can be optimized by district, location, and time of day to focus on particular crimes.

Get Data

There was 1 file found in the data directory /Users/winstonsaunders/Documents/Crime_Visualization_Challenge.

## 'data.frame':    32921 obs. of  12 variables:
##  $ IncidntNum: int  140622186 140741225 140593098 140644839 146195066 140662825 140549580 140562902 140676343 140556585 ...
##  $ Category  : Factor w/ 36 levels "ARSON","ASSAULT",..: 17 17 33 17 33 21 2 2 17 22 ...
##  $ Descript  : Factor w/ 418 levels "ABANDONMENT OF CHILD",..: 206 205 245 278 248 196 124 74 277 154 ...
##  $ DayOfWeek : Factor w/ 7 levels "Monday","Tuesday",..: 6 3 6 7 7 6 3 1 1 6 ...
##  $ Date      : Date, format: "2014-07-26" "2014-09-03" ...
##  $ Time      : num  20 9 18 11 14 7 16 13 9 9 ...
##  $ PdDistrict: Factor w/ 10 levels "BAYVIEW","CENTRAL",..: 2 7 9 8 2 6 1 8 9 6 ...
##  $ Resolution: Factor w/ 16 levels "ARREST, BOOKED",..: 12 12 12 12 12 12 12 1 12 2 ...
##  $ Address   : Factor w/ 8867 levels "0.0 Block of 10TH ST",..: 4423 4821 2119 2676 4889 7847 5532 5536 1546 6336 ...
##  $ X         : num  -122 -122 -122 -122 -122 ...
##  $ Y         : num  37.8 37.8 37.8 37.8 37.8 ...
##  $ Location  : Factor w/ 13201 levels "(37.7080829769597, -122.419241455854)",..: 11424 6975 4396 13200 11415 6925 686 10004 5322 7256 ...

The output above shows the structure of the data. The data file contains records on 32921 crimes, each described by 12 variables.
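In the structure above, Date is stored as a Date and Time as a plain number of hours. The following is a minimal sketch of how such columns might be derived. It assumes the raw export stores dates as "MM/DD/YYYY" strings and times as "HH:MM" strings, which is an assumption since the raw file is not shown here; the data frame `raw` is a toy stand-in, not the actual data.

```r
# Toy stand-in for the raw export; the column formats are assumptions.
raw <- data.frame(
  Category = c("LARCENY/THEFT", "ASSAULT", "VANDALISM"),
  Date     = c("07/26/2014", "09/03/2014", "07/27/2014"),
  Time     = c("20:15", "09:40", "18:05"),
  stringsAsFactors = FALSE
)

crimes <- raw
# Parse the date strings into Date objects.
crimes$Date <- as.Date(raw$Date, format = "%m/%d/%Y")
# Keep only the hour of day as a number (e.g. "20:15" becomes 20).
crimes$Time <- as.numeric(substr(raw$Time, 1, 2))

# Inspect the result, as was done for the full data set.
str(crimes)
```

Truncating to the hour discards the minutes, which is sufficient for the hour-by-hour comparisons made later in this analysis.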

Analysis

Question 1: Does the number of crimes show obvious day to day variation, particularly between weekdays and weekends?

plot of chunk plot of data

Crime rates seem to vary with the day of the week, but the strength of the effect appears to depend on the district. For instance, the Central and Southern districts show higher rates on the weekend, whereas Taraval and Richmond appear to show little change.
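A day-by-district comparison like this comes down to a cross-tabulation of incidents. The sketch below shows one way to build it in base R; the records are invented for illustration, and the names `day_levels`, `by_day`, and `citywide` are hypothetical.

```r
# Toy incident records; the values are invented for illustration only.
crimes <- data.frame(
  PdDistrict = c("CENTRAL", "CENTRAL", "CENTRAL", "TARAVAL", "TARAVAL", "CENTRAL"),
  DayOfWeek  = c("Saturday", "Saturday", "Monday", "Monday", "Saturday", "Sunday"),
  stringsAsFactors = FALSE
)

# Order the weekdays so tables and plots run Monday through Sunday.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
crimes$DayOfWeek <- factor(crimes$DayOfWeek, levels = day_levels)

# Cross-tabulate: one row per district, one column per day of the week.
by_day <- table(crimes$PdDistrict, crimes$DayOfWeek)
print(by_day)

# Citywide totals per day are the column sums of the same table.
citywide <- colSums(by_day)
print(citywide)
```

Setting the factor levels explicitly keeps days with no incidents in the table as zero counts rather than dropping them.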

Question 2: How does crime vary day to day on a per district basis?

plot of chunk plot of data2

This is more interesting. Each district has its own pattern of variation. Some of the more interesting ones are listed below.
- Bayview is mostly flat, but seems to show a higher rate on Friday nights.
- Central shows a strong upward trend on the weekends, with Friday and Saturday nights showing about a 20% increase in crime.
- Mission, while having a fairly high overall crime rate, shows little variation.
- Tenderloin shows an apparent drop in the crime rate.
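A figure such as "about a 20% increase" can be checked by comparing the mean count on the days of interest with the mean count on the remaining days. The sketch below does this for two placeholder districts; the counts are hypothetical and chosen only to make the arithmetic easy to follow, and `peak_days` is an assumed definition of the weekend nights.

```r
# Hypothetical counts for two placeholder districts (rows) by day (columns).
counts <- matrix(
  c(100, 100, 100, 100, 120, 120, 100,
     50,  50,  50,  50,  50,  50,  50),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("DISTRICT_A", "DISTRICT_B"),
                  c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday"))
)

# Assumed definition of the days of interest.
peak_days  <- c("Friday", "Saturday")
other_days <- setdiff(colnames(counts), peak_days)

# Mean count per day on peak days versus the remaining days.
peak_mean  <- rowMeans(counts[, peak_days, drop = FALSE])
other_mean <- rowMeans(counts[, other_days, drop = FALSE])

# Percent change of peak days relative to the other days.
pct_change <- 100 * (peak_mean - other_mean) / other_mean
print(round(pct_change, 1))
```

With real data, each weekday may occur a different number of times in the sample period, so counts should be divided by the number of occurrences of that weekday before comparing.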

Question 3: Does the leading type of crime vary by district?

Given the variability of crime by district, it is natural to ask whether the nature of the crimes also differs from district to district. The easiest way to get at this is to split the data by district and sort the crime categories by frequency. First let’s look citywide.

##                  SF
## LARCENY/THEFT  9262
## OTHER OFFENSES 4241
## NON-CRIMINAL   3846
## ASSAULT        2691
## VANDALISM      1775
## VEHICLE THEFT  1762
## [1] "TENDERLOIN"
##                ctable
## LARCENY/THEFT     472
## NON-CRIMINAL      318
## OTHER OFFENSES    308
## ASSAULT           302
## DRUG/NARCOTIC     291
## [1] "MISSION"
##                ctable
## LARCENY/THEFT     657
## OTHER OFFENSES    589
## NON-CRIMINAL      490
## ASSAULT           413
## WARRANTS          297
## [1] "NORTHERN"
##                ctable
## LARCENY/THEFT    1284
## NON-CRIMINAL      416
## OTHER OFFENSES    401
## ASSAULT           279
## DRUG/NARCOTIC     190
## [1] "RICHMOND"
##                ctable
## LARCENY/THEFT     560
## NON-CRIMINAL      248
## OTHER OFFENSES    219
## VANDALISM         111
## VEHICLE THEFT     100

This detail starts to show some of the richness of the data. For instance, in the Mission District, while Larceny/Theft is the most prevalent item, Assault and Drug/Narcotic violations together account for more total crime than does Larceny/Theft.
In the Richmond District, by contrast, Assault is not among the top six items, while Vandalism and Vehicle Theft together (111 + 100 = 211) account for less than half of the leading crime, again Larceny/Theft (560).

Hence, although the leading type of crime does not vary by district, the mix of top crimes shows marked variation from district to district.
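The "pull the data apart by district and sort" step can be sketched in a few lines of base R. The records below are toy values, and `top_categories` is a hypothetical helper written for this illustration rather than the code that produced the tables above.

```r
# Toy records; the counts are invented for illustration only.
crimes <- data.frame(
  PdDistrict = c(rep("RICHMOND", 6), rep("TENDERLOIN", 6)),
  Category   = c("LARCENY/THEFT", "LARCENY/THEFT", "LARCENY/THEFT",
                 "VANDALISM", "VANDALISM", "VEHICLE THEFT",
                 "LARCENY/THEFT", "LARCENY/THEFT", "LARCENY/THEFT",
                 "ASSAULT", "ASSAULT", "DRUG/NARCOTIC"),
  stringsAsFactors = FALSE
)

# Return the n most frequent categories as a table of counts.
top_categories <- function(category, n = 5) {
  counts <- sort(table(category), decreasing = TRUE)
  head(counts, n)
}

# Citywide ranking first, then one ranking per district.
print(top_categories(crimes$Category))
by_district <- lapply(split(crimes$Category, crimes$PdDistrict),
                      top_categories, n = 2)
print(by_district)
```

Because `split` returns a named list, the same helper can be applied to every district at once and the results compared side by side.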

Mapping Crime: Can we see hotspots where crimes are most prevalent in San Francisco?

Here the hypothesis is that we can see crime “hot spots” by plotting incidents geographically. This takes advantage of the fairly precise location data. To speed up the analysis I’ve chosen to focus on only a few “top” crimes from the lists above, namely Larceny/Theft, Vehicle Theft, Assault, and Vandalism.

plot of chunk map_it

We can see hotspots where crimes are more prevalent.

The map shows the locations of crimes:
- Red data points correspond to thefts; these appear to be localized mainly to tourist areas.
- Blue data points, representing Assault, appear localized in the Tenderloin, Mission, and Broadway areas.
- DarkGreen data points, representing Vehicle Theft, are more spread across the city but appear most prevalent in residential areas.
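As a rough numeric complement to reading hotspots off a map, incidents can be binned into grid cells and counted. This is a swapped-in technique for illustration, not the method used to draw the map; the coordinates are toy values and the two-decimal cell size is an arbitrary choice.

```r
# Toy coordinates (longitude X, latitude Y); values are illustrative only.
crimes <- data.frame(
  X = c(-122.4101, -122.4103, -122.4098, -122.4760, -122.4102),
  Y = c(37.7841, 37.7843, 37.7839, 37.7600, 37.7842)
)

# Snap each point to a grid cell by rounding to 2 decimal places
# (roughly 1 km cells at this latitude; the cell size is an arbitrary choice).
cell <- paste(round(crimes$X, 2), round(crimes$Y, 2), sep = ",")

# Count incidents per cell and list the busiest cells first.
cell_counts <- sort(table(cell), decreasing = TRUE)
print(head(cell_counts, 3))
```

Running the same count separately for each crime category would give a numeric check on whether thefts and assaults really concentrate in different cells.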

Do particular crimes happen more frequently at different times of day?

plot of chunk by_time_of_day
plot of chunk by_time_of_day
plot of chunk by_time_of_day
plot of chunk by_time_of_day

Interestingly, particular crimes seem to show distinct time-of-day behavior. For instance, Larceny/Theft appears to be low during the morning hours but peaks around 6 pm. Combining this with the location information above suggests that police patrols for theft would be most effective in tourist areas from around 6 pm to about 8 pm, after which the theft rate drops off.

Patrols for Assault, which is also well localized, need to be in place most of the day. The rate rises rapidly from the morning hours and then plateaus for most of the day, falling off again only after midnight.

Vehicle theft and vandalism, on the other hand, pick up only after about 6 pm and apparently drop off after midnight, suggesting that police patrols for these crimes need to focus primarily on evening hours.
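The time-of-day profiles discussed here amount to counting incidents per hour within each category. A minimal sketch, using toy records and assuming Time holds the hour of day as a number from 0 to 23 as in the structure listing:

```r
# Toy records; hours and categories are invented for illustration only.
crimes <- data.frame(
  Category = c(rep("LARCENY/THEFT", 5), rep("VEHICLE THEFT", 4)),
  Time     = c(18, 18, 19, 9, 18, 22, 22, 20, 23)
)

# Treat the hour as a factor with all 24 levels so empty hours still show as 0.
hour <- factor(crimes$Time, levels = 0:23)

# Rows are categories, columns are hours of the day.
by_hour <- table(crimes$Category, hour)

# For each category, find the column with the most incidents
# and translate it back into an hour of the day.
peak_idx  <- apply(by_hour, 1, which.max)
peak_hour <- as.integer(colnames(by_hour))[peak_idx]
names(peak_hour) <- rownames(by_hour)
print(peak_hour)
```

A single peak hour is a crude summary; for a category that plateaus, as Assault appears to, the full row of `by_hour` is more informative than its maximum.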

Conclusions

This quick exploratory analysis found that crime frequency and type vary strongly by location in the city and also by time of day. Crime hotspots, like the Tenderloin, require patrols for Assault throughout the day. Theft and larceny, which occur mostly in tourist areas, are concentrated from around 6 pm to midnight and decrease rapidly afterward, so patrols for them should focus on those hours.

More analysis is recommended. For instance, I noticed on the SFPD website that several years of data can be downloaded. It would be interesting to correlate these data with changes in vagrancy laws and police patrolling strategies to see what has been effective in reducing crime.

There are some inherent systematic assumptions in this data which also need further assessment. For instance, the data here reflect police action, not the crimes themselves, so unreported crimes are not captured in this analysis. Also, while the data measure the frequency of police action, they do not accurately represent the hazard rate for an individual, since population density is not accounted for. For instance, a higher crime count in the Broadway district does not necessarily mean a significant increase in danger for an individual, since this is an area with a significant nightlife population.